There are many replication files accompanying this study because there
are a number of different simulations and datasets. The GenMatch runs
are computationally intensive, so there are a number of intermediate
files. The simulations, in particular, take a long time to run, so
some of them have more than one output file for a given condition
(with the simulation replications split across the files).

Please contact Jas Sekhon <sekhon@berkeley.edu> if you have any
questions or need help with replication.

The code was run on Linux (Ubuntu) with R 2.12, Matching 5.7-1, and
rgenoud 4.7-10. The header of each output file reports the versions of
R, Matching, and genoud used. An important compatibility issue is that
the code that uses the 'AutoCluster3.R' file will only work with
versions of R that have the 'snow' package. Newer versions of R (3+)
require the 'AutoCluster4.R' file instead, which is also
included in this replication archive. Neither file is likely to run on
a Windows machine (but hacked versions of 'snow' do exist for
Windows). If neither 'snow' nor 'parallel' is available, one could
simply edit the code to not use multiple cores.
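
As a rough illustration of that fallback, the sketch below starts local
workers when 'parallel' is installed and otherwise runs serially. It is
not the code in 'AutoCluster3.R' or 'AutoCluster4.R'; the helper names
make_workers and run_task and the toy task are hypothetical.

```r
# Hypothetical sketch; not taken from AutoCluster3.R or AutoCluster4.R.
# Returns a local cluster if 'parallel' is installed, otherwise NULL.
make_workers <- function(n = 2) {
  if (requireNamespace("parallel", quietly = TRUE)) {
    parallel::makeCluster(n)
  } else {
    NULL
  }
}

# Apply FUN over xs on the workers if there are any, otherwise serially.
run_task <- function(cl, xs, FUN) {
  if (is.null(cl)) lapply(xs, FUN) else parallel::parLapply(cl, xs, FUN)
}

cl <- make_workers(2)
# Toy stand-in for one simulation replication.
res <- run_task(cl, 1:4, function(x) x^2)
if (!is.null(cl)) parallel::stopCluster(cl)
print(unlist(res))
```

Passing NULL for the cluster gives the single-core path, so the same
calling code works on machines without 'parallel'.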

Note that the algorithm is stochastic so small differences are to be
expected. Also note that the optimization software has improved over
time. 

Files:

Simulation Study 1 Tables and Figures

The simulation data is created by data1.obs1000.reps1000.R (+Rout
denotes the output file with the same base name, with the extension
".Rout" in place of ".R"). This file depends on the dataset_generate.R
file from Brian Lee (also included in the archive). The
data1.obs1000.reps1000.R file creates the RData files that the other
files use, so you have to run it first. It will create:
RData.simdata.A.obs1000.reps1000,
RData.simdata.B.obs1000.reps1000,
RData.simdata.C.obs1000.reps1000,
RData.simdata.D.obs1000.reps1000,
RData.simdata.E.obs1000.reps1000,
RData.simdata.F.obs1000.reps1000,
RData.simdata.G.obs1000.reps1000
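
Before running the downstream files, one might check that all seven
data files exist. A minimal sketch, assuming the files sit in the
current working directory (the variable names are illustrative):

```r
# File names follow the pattern listed above, for conditions A through G.
conditions <- LETTERS[1:7]
sim_files <- sprintf("RData.simdata.%s.obs1000.reps1000", conditions)

# Report any file that is not yet in the working directory (an assumption
# about where the files live).
missing_files <- sim_files[!file.exists(sim_files)]
if (length(missing_files) > 0) {
  message("Run data1.obs1000.reps1000.R first; missing: ",
          paste(missing_files, collapse = ", "))
} else {
  message("All ", length(sim_files), " simulation data files are present.")
}
```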

The balance3.combine1.nobs1000.R (+Rout) file combines the simulation
results. It generates the following plots:
balance3_combine1_nobs1000_expA.pdf,
balance3_combine1_nobs1000_expB.pdf,
balance3_combine1_nobs1000_expC.pdf,
balance3_combine1_nobs1000_expD.pdf,
balance3_combine1_nobs1000_expE.pdf,
balance3_combine1_nobs1000_expF.pdf,
balance3_combine1_nobs1000_expG.pdf

balance3.combine1.nobs1000.R (+Rout) depends on the following datasets, which contain the actual results from the simulations:
balance3.simsA1.nobs1000.Rdata created by balance3.simsA1.nobs1000.R (+Rout) which depends on: simsA1.nobs1000.R (+Rout) and simsA1.nobs100.gm1.results
balance3.simsB1.nobs1000.Rdata created by balance3.simsB1.nobs1000.R (+Rout) which depends on: simsB1.nobs1000.R (+Rout) and simsB1.nobs100.gm1.results
balance3.simsC1.nobs1000.Rdata created by balance3.simsC1.nobs1000.R (+Rout) which depends on: simsC1.nobs1000.R (+Rout) and simsC1.nobs100.gm1.results
balance3.simsD1.nobs1000.Rdata created by balance3.simsD1.nobs1000.R (+Rout) which depends on: simsD1.nobs1000.R (+Rout) and simsD1.nobs100.gm1.results
balance3.simsE1.nobs1000.Rdata created by balance3.simsE1.nobs1000.R (+Rout) which depends on: simsE1.nobs1000.R (+Rout) and simsE1.nobs100.gm1.results
balance3.simsF1.nobs1000.Rdata created by balance3.simsF1.nobs1000.R (+Rout) which depends on: simsF1.nobs1000.R (+Rout) and simsF1.nobs100.gm1.results
balance3.simsG1.nobs1000.Rdata created by balance3.simsG1.nobs1000.R (+Rout) which depends on: simsG1.nobs1000.R (+Rout) and simsG1.nobs100.gm1.results
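
The dependencies above suggest a run order for Simulation Study 1. The
sketch below only builds and prints that order; it assumes each
simsX1 script must finish before the matching balance3.simsX1 script,
and it does not account for simulations whose replications are split
across several files.

```r
# Assumed order: data generation, per-condition simulations,
# per-condition balance summaries, then the combining script.
conditions <- LETTERS[1:7]
run_order <- c(
  "data1.obs1000.reps1000.R",
  sprintf("sims%s1.nobs1000.R", conditions),
  sprintf("balance3.sims%s1.nobs1000.R", conditions),
  "balance3.combine1.nobs1000.R"
)
print(run_order)

# To actually execute a script (long-running), one could call:
#   system2("R", c("CMD", "BATCH", script))
# which writes its output to a file with ".Rout" appended.
```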

Simulation Study 2

The data file for this study is RData.mc2.data1, and it is created by
mc2.data1.R (+Rout). You have to run mc2.data1.R before any of the
other files in order to create the needed data file. The non-GenMatch
results are generated by mc2.machine1.R (+Rout). The GenMatch results
are generated by the following files. The files are the same, but the
algorithm is stochastic, so together they give an idea of the range of
results to be expected: mc2.gm1.R (+Rout), mc2.gm1b.R (+Rout),
mc2.gm3.R (+Rout).


Data Example:

The simplest files here are for the Early RA results. We suggest that
one start replication with that dataset.

The 'ec675_nsw.dta' file contains the three experimental datasets: the
Lalonde, DW, and early RA samples. They are indexed by the lalonde.dta,
dw.dta, and early.ra indices. The variables are:

"treat": treatment
"age": age
"education": years of education
"black": indicator for being black
"hispan": indicator for being Hispanic
"married": indicator for being married
"nodegree": indicator for not having a high school degree
"re74": real earnings in 1974
"re75": real earnings in 1975
"re78": real earnings in 1978

The PSID data files:
no re74: rep.sourcecode.nore74psid1.RData
with re74: rep.sourcecode.re74psid1.RData

The CPS data files:
with re74: rep.sourcecode.re74cps1.RData also DW.cps1.re74.RData (the object 'foo' in rep.sourcecode.re74cps1.RData is the same as DW.cps1.re74.RData)
without re74: rep.sourcecode.nore74cps1.RData
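
To see which objects one of these .RData files contains without
cluttering the workspace, something like the following could be used.
The helper load_rdata is hypothetical and not part of the archive.

```r
# Hypothetical helper: load an .RData file into its own environment so
# its objects do not overwrite anything in the workspace.
load_rdata <- function(path) {
  env <- new.env()
  load(path, envir = env)
  env
}

cps_file <- "rep.sourcecode.re74cps1.RData"
if (file.exists(cps_file)) {
  cps <- load_rdata(cps_file)
  print(ls(cps))   # should include 'foo', per the note above
  str(cps$foo)
}
```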

Early RA results:
CPS1: earlyRA1.cps1.gm2.R (+Rout)
psid: earlyRA1.psid1.gm2.R (+Rout) 

DW subsample:
psid: easy version: psid1.re74.gm4.triton1.engine1.R (+Rout). 
Optimization version (loop truncated):
psid1.re74.gm4.triton1.R (+Rout)

cps: 
run_replicate.GenMatch.cps1.cluster1.popsize5000.wg20.R (+Rout)
calls replicate.GenMatch.cps1.cluster1.popsize5000.wg20.R

Lalonde subsample:
cps: cps1.nore74.R (+Rout)
psid: psid1.nore74.gm4.triton1.R (+Rout) 
